AT ITS LAUNCH in 1999, the Text Creation Partnership was a breakthrough 
collaboration between libraries and commercial publishers of digitized material. It 
aimed to provide collections of electronic texts from the early modern period that were 
freely accessible to the public, transcribed to a high degree of accuracy, and encoded to 
enable re-use and analysis. Its initial impetus was the collaboration with ProQuest’s 
Early English Books Online (EEBO), but other collaborations were established with 
Readex’s Evans Early American Imprints and Gale’s Eighteenth Century Collections 
Online (ECCO). The TCP website provides a good history of its projects, and Shawn 
Martin’s 2009 essay “A Universal Humanities Digital Library: Pipe Dream or 
Prospective Future?” offers useful background and reflects on the possibilities 
and challenges of the TCP project as a whole. However, the EEBO-TCP collaboration 
has generated most scholarly commentary (see, for example, Welzenbach, 2012; Mak, 
2014; Mueller, 2018; Gavin, 2019; Herman, 2020). This interest reflects a number of 
factors peculiar to the success and visibility of EEBO-TCP. One factor was that, on 
its publication in 1999, EEBO consisted only of page images; in transcribing these 
page images, the TCP provided the text that enabled subsequent computational 
analysis and electronic text editing. The other significant factor was that the large 
number of texts transcribed—currently around 65,000 texts—enabled the 
development of several large-scale projects for exploring and analysing the literature, 
language, and print culture of the period, for example, The Early Print Library, 
PRISMS, Visualizing English Print, the Early Modern OCR Project (eMOP), and 
Linguistic DNA. 


In contrast—and although it is also used in several of the projects just mentioned 
—few analyses focus on the history of the TCP collaboration with ECCO. 
Consequently, unanswered questions remain about the nature of ECCO-TCP which 
this short essay aims to answer. Why did ECCO-TCP stop after a relatively small 
number of texts were transcribed? What organisational pressures and individual 
[…] addition—and aside from academic articles like this—how do we find the answers to 
such questions? As Roopika Risam has argued, “the reification of canons in digital 
form is not only a function of what is there—what gets digitalised and thus 
represented in the digital cultural record—but also how it is there—how those who 
have created their projects are presenting their subjects” (17). In short, how are such 
digital collections contextualised and their histories framed? 

The scale of ECCO-TCP is relatively small compared to the larger and arguably 
more successful EEBO-TCP. Initial expectations for ECCO-TCP were high: 10,000 
texts were planned to be transcribed.¹ However, between 2004 and 2012 only 3,101 
texts were eventually transcribed and encoded, comprising 2,473 fully edited texts, and 
628 released without being subject to final proofing and editing.² So, why did work 
stop? As I have suggested elsewhere, financial factors impinged on the sustainability of 
ECCO-TCP (75-76). The TCP is funded according to a “quasi-commercial model” 
in which libraries and institutions that purchased EEBO, Evans Early American 
Imprints, or ECCO could become contributing partners with the TCP; these funds 
were then matched by the commercial publishers, ProQuest, Readex, or Gale (Martin, 
4). However, in 2006 TCP’s executive board predicted budget deficits and sought to 
secure more funding from its partner institutions (“TCP Executive Board”). Paul 
Schaffner, director of the TCP, recalled that, “we never received the financial support 
that we hoped for” and at some time after 2009, “we ran out of money” and the 
ECCO-TCP project used “what was left to review and complete the books in the 
pipeline” (Schaffner). By 2012, these financial constraints prevented ECCO-TCP 
from populating its site with additional transcribed and encoded texts. 

The other problem that seemed to have sapped the energy behind the ECCO- 
TCP project was the question of its very nature. First, what exactly were the benefits 
of transcribing material from ECCO? What did the project hope to achieve? As 
mentioned earlier, TCP’s collaboration with ProQuest’s EEBO responded to a vital 
need and had a rigorous rationale; namely, it provided the searchable text which 
EEBO lacked. However, ECCO already had searchable text, produced by OCR 
software. Of course, it is the accuracy of text transcriptions which underpins any digital 
scholarship that uses the TCP collections. One of TCP’s missions was to “Present the 
user with accurately keyed, modern-font texts that are faithful to the spellings and 
organization of the original works.” ECCO’s notoriously messy OCR-produced text, 
though, rendered this objective impossible (Gregg 62-66). Nevertheless, TCP’s 
mission was complicated by the sheer size of ECCO, which clearly presented a 
huge challenge: what criteria would be used to select texts that would benefit from 
transcription from over 180,000 titles? 

ECCO-TCP, like all human artefacts of collecting, is a product of institutional 
and human choices. Martin Mueller describes it as “a cherry-picked collection with an 
emphasis on canonical high-culture texts.” But how did it become that way? The 
geographic and linguistic biases of ECCO itself undoubtedly shaped its bias towards 
canonical authors (Tolonen, et al. 22-27). To a significant extent, this legacy can be 
traced to the foundations of ECCO: the microfilming project which tended to favour 
canonical male authors and the Anglocentrism of the originary Eighteenth Century Short Title 
Catalogue begun in 1976 (Gregg, 12-13, 23).³ In this context, the criteria established 
by a TCP “selection task force” set up in August 2005 are illuminating: 
1. ECCO-TCP will use the New Cambridge Bibliography of English 
Literature as a guide to begin the selection process, because this standard 
reference work is by no means confined in scope to ‘literature,’ but provides a 
good overview of writing of all kinds — philosophical, religious, travel, 
periodical, historical, and so on. 
2. ECCO-TCP will supplement these selections with suggestions from 
scholars, anthologies, and other bibliographies 
3. Titles in languages other than English normally will be excluded from 
selection in ECCO-TCP. 
4. ECCO-TCP will also, as far as possible, try to include works that will 
benefit from the added value the project brings (titles with complex structures 
like encyclopedias and works with bad OCR) 
5. ECCO-TCP will include authors who cross the seventeenth and 
eighteenth centuries, such as Defoe and Swift, and will include their political, 
religious, and economic texts where appropriate in order to provide complete 
representation of these authors in the overall TCP collection.⁴ 

Schaffner noted that, apart from the broad and ambitious aim of identifying 
“added value,” these criteria were largely workable (for example, non-fictional works 
by Defoe are very well represented, attribution questions aside). However, these 
guidelines resulted in an uneven set of texts: decisions were inevitably subject to 
institutional pressures and individual human choice. For example, the relatively good 
representation of medical texts and Irish-themed fiction reflects the demands of 
particular partner institutions; and Schaffner himself acknowledged that his own 
interest in hymn books probably resulted in the inclusion of Isaac Watts, Charles 
Wesley, and Philip Doddridge (Schaffner). Decisions about what to include were also 
influenced by the use of the New Cambridge Bibliography of English Literature volume 
2: 1660-1800, published in 1971 (!) and its definition of “Major” authors. So, there are 
no works of fiction by the popular early women writers such as Penelope Aubin, Eliza 
Haywood, or Delarivier Manley, but—as an instance of individual choice—twenty- 
two works by “Minor” novelist Samuel Jackson Pratt are included. It seems the 
selection task force must have argued for Olaudah Equiano’s Narrative to be 
transcribed for the collection since it is not listed in the New Cambridge bibliography, 
but works by other writers of the early black Atlantic, including James Albert Ukawsaw 
Gronniosaw, Phillis Wheatley, Ignatius Sancho, or Ottobah Cugoano, were not 
selected. 

The challenge presented by the lack of a clear argument for the project, a wide- 
ranging set of criteria, and the scale of ECCO resulted in a conservative and 
idiosyncratic collection that seems to have reflected eighteenth-century scholarship as 
it stood in the late twentieth century. On top of that, the small scale of ECCO-TCP 
arguably magnifies ECCO’s own inherent biases. Such biases also have the potential 
to impact any research based on the projects mentioned earlier. Literary and historical 
canons change, of course, and it might seem that I have unduly fixated on the use of a 
1971 bibliography to decide in 2005 what texts were valuable for a digital collection. 
But while the ECCO-TCP webpage acknowledges that it is “perhaps better described 
as a proof of concept than as a completed project,” it avoids detailing the various 
factors that have shaped the nature of the collection (“Text Creation Partnership”). 
That is, despite TCP’s laudable claim that “Our policies were imbued with a librarian’s 
attitude toward content: a resolve to prepare materials without agenda or bias, and 
with a view toward wide use and reuse,” this oversight remains. The larger point is that 
we need to understand the nature of these collections and their biases, and that— 
without users and researchers having to carry out some additional detective work—an 
explicit framing of the financial, institutional, and human contexts that shape how and 
why they are made is essential for a more nuanced understanding and use of such 
digital collections. 


Bath Spa University 


¹ Initial estimate courtesy of Jonathan Blaney. 


² Notably, Gale did not ingest the TCP transcriptions into ECCO. In contrast, the UK 
organisation Jisc, another partner of TCP, ingested ECCO-TCP texts into its Historical Texts 
platform in 2016 (“Developmental Roadmap”). 


³ Relatedly, TCP itself is not without its racial and gendered dimensions, since transcription 
is outsourced to workers in the Global South. See Mattie Burkert. 


⁴ I obtained this unpublished “Selection Task Force Report” (9-10 August 2005) courtesy of 
Paul Schaffner. 